-
Notifications
You must be signed in to change notification settings - Fork 203
feat: Salmon-tximport meta-wrapper pathvars update #4665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe changes introduce a parameterized path configuration system for the salmon_tximport workflow. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Review requires verifying path placeholder consistency across multiple interdependent rules, confirming test configuration mappings align with meta.yaml definitions, validating placeholder syntax and output directory structure changes, and ensuring test resources remain compatible with updated FASTA formats. Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
meta/bio/salmon_tximport/test/reads/S1.fastq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S1_R1.fq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S1_R2.fq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S2.fastq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S2_R1.fq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S2_R2.fq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S3.fastq.gzis excluded by!**/*.gzmeta/bio/salmon_tximport/test/reads/S4.fastq.gzis excluded by!**/*.gz
📒 Files selected for processing (8)
meta/bio/salmon_tximport/meta.yaml(1 hunks)meta/bio/salmon_tximport/meta_wrapper.smk(5 hunks)meta/bio/salmon_tximport/test/Snakefile(1 hunks)meta/bio/salmon_tximport/test/config.yaml(1 hunks)meta/bio/salmon_tximport/test/resources/genome.fasta.fai(1 hunks)meta/bio/salmon_tximport/test/resources/transcriptome.fasta(1 hunks)meta/bio/salmon_tximport/test/resources/transcriptome.fasta.fai(1 hunks)test_wrappers.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
test_wrappers.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: docs
- GitHub Check: testing
- GitHub Check: Summary
🔇 Additional comments (7)
meta/bio/salmon_tximport/test/resources/transcriptome.fasta.fai (1)
1-2: LGTM: Test resource file is correctly formatted.The FASTA index file follows the standard
.faiformat and provides appropriate test data for the pathvars-based configuration system.meta/bio/salmon_tximport/test/resources/genome.fasta.fai (1)
1-1: LGTM: Genome index file is correctly formatted.The FASTA index file is properly structured and aligns with the new pathvars configuration for test resources.
meta/bio/salmon_tximport/test/config.yaml (1)
1-9: LGTM: Test configuration properly defines pathvars mappings.The pathvars configuration aligns with the schema defined in
meta.yamland provides appropriate test resource paths.test_wrappers.py (1)
1104-1104: LGTM: Test expectation updated to match new pathvars structure.The path correctly reflects the new results-prefixed output location defined in the meta-wrapper.
meta/bio/salmon_tximport/test/resources/transcriptome.fasta (1)
1-5: LGTM: Test sequences updated appropriately.The FASTA sequences have been updated and their lengths match the corresponding
.faiindex file entries (transcript1: 49bp, transcript2: 81bp).meta/bio/salmon_tximport/test/Snakefile (1)
6-13: LGTM: Configuration properly integrated into test workflow.The addition of
configfileand the block-stylemeta_wrapperwith explicitconfigsection correctly enables the pathvars-based configuration system.meta/bio/salmon_tximport/meta_wrapper.smk (1)
1-89: LGTM: Placeholder-based parameterization for salmon rules is well-implemented.The
salmon_decoy_sequences,salmon_index_gentrome, andsalmon_quant_readsrules correctly use pathvars placeholders (<transcriptome_sequence>,<genome_sequence>,<resources>,<logs>,<reads_r1>,<reads_r2>,<genome_annotation>,<per>) to enable user configuration without code modification.
| rule tximport: | ||
| input: | ||
| quant=expand( | ||
| "pseudo_mapping/{sample}/quant.sf", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/quant.sf", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| lib=expand( | ||
| "pseudo_mapping/{sample}/lib_format_counts.json", | ||
| sample=["S1", "S2", "S3", "S4"], | ||
| "<results>/pseudo_mapping/{sample}/lib_format_counts.json", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| aux_info=expand( | ||
| "pseudo_mapping/{sample}/aux_info", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/aux_info", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| cmd_info=expand( | ||
| "pseudo_mapping/{sample}/cmd_info.json", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/cmd_info.json", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| libparams=expand( | ||
| "pseudo_mapping/{sample}/libParams", sample=["S1", "S2", "S3", "S4"] | ||
| "<results>/pseudo_mapping/{sample}/libParams", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| logs=expand("pseudo_mapping/{sample}/logs", sample=["S1", "S2", "S3", "S4"]), | ||
| tx_to_gene="resources/tx2gene.tsv", | ||
| logs=expand( | ||
| "<results>/pseudo_mapping/{sample}/logs", | ||
| sample=["S1", "S2"], | ||
| ), | ||
| tx_to_gene="<tx_to_gene>", | ||
| output: | ||
| txi="tximport/SummarizedExperimentObject.RDS", | ||
| txi="<results>/tximport/SummarizedExperimentObject.RDS", | ||
| params: | ||
| extra="type='salmon'", | ||
| log: | ||
| "logs/tximport.log" | ||
| "<logs>/tximport.log", | ||
| wrapper: | ||
| "master/bio/tximport" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Hardcoded sample list prevents user configuration.
The tximport rule hardcodes the sample list as ["S1", "S2"] in all six expand() calls (lines 95, 99, 103, 107, 111, 115). This is the exact issue mentioned in the PR description: "the meta-wrapper will not function unless users modify the last rule; currently samples are still listed within the expand function."
This defeats the purpose of the pathvars system, as users must edit the wrapper code to use their own samples.
Suggested solution:
- Add a
samplespathvar tometa.yaml:
custom:
samples: List of sample identifiers to process (e.g., ["sample1", "sample2"])- In
meta_wrapper.smk, replace the hardcoded lists with a placeholder or configuration reference:
# Option 1: Use a placeholder (if Snakemake supports this for lists)
sample=<samples>,
# Option 2: Reference from config (more traditional approach)
sample=config.get("samples", ["S1", "S2"]),- Update
test/config.yaml:
pathvars:
samples: ["S1", "S2"]This would allow users to configure samples in their config file without touching the wrapper code.
Do you want me to help generate a solution that makes the sample list configurable through pathvars or config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, pathvars only expand to strings. One could use config["pathvars"]["samples"] to get past this limitation, but it does not comply with the way pathvars were designed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
Snakemake pathvars expand sample list configuration
💡 Result:
Short answer + examples.
- Put samples (or a samples file path) in config.yaml and load it:
config.yaml:
samples: ["s1","s2","s3"]
data_dir: "data"
Snakefile:
configfile: "config.yaml"
rule all:
input:
expand("{dir}/{sample}.fastq.gz", dir=config["data_dir"], sample=config["samples"])
This uses expand with config lists and path variables. [1][2]
- If your config holds per-sample paths (dict), use them directly or via an input function (deferred evaluation) so wildcards resolve at DAG time:
config.yaml:
samples:
s1: "data/s1_R1.fastq.gz"
s2: "data/s2_R1.fastq.gz"
Snakefile:
configfile: "config.yaml"
rule map:
input:
lambda wildcards: config["samples"][wildcards.sample]
output:
"mapped/{sample}.bam"
shell:
"mapper {input} > {output}"
Use an input function when inputs depend on a job's wildcard values. [4]
- If you store only a samples TSV, read it and use its index or column:
config.yaml:
samples: "samples.tsv"
Snakefile:
import pandas as pd
configfile: "config.yaml"
samples = pd.read_table(config["samples"]).set_index("sample").index.tolist()
rule all:
input: expand("qc/{sample}_qc.html", sample=samples)
Snakemake docs show using config to list samples and validating sample sheets. [1][3]
References
[1] Snakemake configuration docs (examples with expand and config).
[2] Snakemake examples/tutorials showing expand usage with config lists.
[3] Using a samples TSV / reading config-specified sample sheets.
[4] Deferred input functions explanation (use lambda or def for inputs that depend on wildcards).
@tdayris, thank you for clarifying the pathvars limitation! That makes sense—pathvars are designed for path string substitution, not data structures.
Given that constraint, the standard Snakemake approach would be to keep the sample list in config (outside of pathvars) and reference it directly in the expand() calls. Here's the recommended solution:
1. In meta.yaml, document that samples should be provided in config:
# Add a comment in the documentation section
# Note: Sample list should be provided via config["samples"] as pathvars only support strings2. In meta_wrapper.smk, replace the hardcoded sample list with a config reference:
rule tximport:
input:
quant=expand(
"<results>/pseudo_mapping/{sample}/quant.sf",
sample=config.get("samples", ["S1", "S2"]),
),
lib=expand(
"<results>/pseudo_mapping/{sample}/lib_format_counts.json",
sample=config.get("samples", ["S1", "S2"]),
),
aux_info=expand(
"<results>/pseudo_mapping/{sample}/aux_info",
sample=config.get("samples", ["S1", "S2"]),
),
cmd_info=expand(
"<results>/pseudo_mapping/{sample}/cmd_info.json",
sample=config.get("samples", ["S1", "S2"]),
),
libparams=expand(
"<results>/pseudo_mapping/{sample}/libParams",
sample=config.get("samples", ["S1", "S2"]),
),
logs=expand(
"<results>/pseudo_mapping/{sample}/logs",
sample=config.get("samples", ["S1", "S2"]),
),
tx_to_gene="<tx_to_gene>",3. In test/config.yaml, add the samples list outside of pathvars:
samples: ["S1", "S2"]
pathvars:
# existing pathvars...This approach:
- ✅ Complies with pathvars design (strings only)
- ✅ Follows standard Snakemake patterns for sample lists
- ✅ Allows users to configure samples in their config file without editing wrapper code
- ✅ Provides sensible defaults for testing
Would you like me to help refine this solution further?
🐰📝
✏️ Learnings added
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4665
File: meta/bio/salmon_tximport/meta_wrapper.smk:91-125
Timestamp: 2025-10-31T14:56:28.220Z
Learning: In Snakemake wrappers, pathvars only expand to strings, not lists or other data structures. Sample lists should be provided via config["samples"] rather than through pathvars, following standard Snakemake configuration patterns.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 4285
File: bio/agat/wrapper.py:336-340
Timestamp: 2025-06-20T13:10:29.240Z
Learning: User tdayris prefers to fix identified code issues immediately rather than deferring them, as indicated by their comment "Now is better than later" when confirming code fixes.
| pathvars: | ||
| default: | ||
| - results | ||
| - resources | ||
| - logs | ||
| custom: | ||
| transcriptome_sequence: Path to FASTA file with transcriptome sequence | ||
| genome_sequence: Path to FASTA file with genome sequence | ||
| genome_annotation: Path to GTF file with genome annotation | ||
| reads_r1: Path/pattern for FASTQ files with R1 reads | ||
| reads_r2: Path/pattern for FASTQ files with R2 reads | ||
| tx_to_gene: Path to TSV table with transcript_id <tab> gene_id | ||
| per: Pattern for sample identifiers, e.g. ``"{sample}"`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Pathvars structure is well-defined, but sample list configuration is missing.
The pathvars section properly defines default directories and custom path mappings. However, there's no pathvar or configuration mechanism for users to specify the list of samples to process. Users would need to know to look at the meta_wrapper.smk file and modify the hardcoded sample list in the tximport rule (currently ["S1", "S2"]), which is not user-friendly.
Consider adding a pathvar or configuration parameter for the sample list, such as:
samples: List of sample identifiers to process
This would allow users to specify their samples in the config file rather than modifying the wrapper code.
🏁 Script executed:
#!/bin/bash
# Inspect the complete meta.yaml file to verify structure
cat meta/bio/salmon_tximport/meta.yamlLength of output: 1769
🏁 Script executed:
#!/bin/bash
# Find and inspect the meta_wrapper.smk file
find . -name "meta_wrapper.smk" -o -name "*wrapper*" -type f | grep -i salmon | head -5Length of output: 180
🏁 Script executed:
#!/bin/bash
# Search for sample list configuration patterns in salmon_tximport directory
find meta/bio/salmon_tximport -type f \( -name "*.smk" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" \) -exec ls -la {} \;Length of output: 355
🏁 Script executed:
#!/bin/bash
# Check the salmon_tximport directory structure
ls -la meta/bio/salmon_tximport/Length of output: 364
🏁 Script executed:
#!/bin/bash
# Inspect meta_wrapper.smk to find sample list definitions
cat meta/bio/salmon_tximport/meta_wrapper.smkLength of output: 3592
🏁 Script executed:
#!/bin/bash
# Inspect test config to see how samples are currently configured
cat meta/bio/salmon_tximport/test/config.yamlLength of output: 316
🏁 Script executed:
#!/bin/bash
# Check if there are other meta wrappers to understand pattern for samples configuration
find meta/bio -name "meta.yaml" -exec grep -l "pathvars" {} \; | head -3 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'Length of output: 4314
Hardcoded sample list in tximport rule prevents user configuration.
The tximport rule in meta_wrapper.smk (lines 108-127) hardcodes samples as ["S1", "S2"] in all expand() calls. Users cannot customize the sample list without modifying the wrapper code itself, which defeats the purpose of configurable meta-wrappers.
The test config demonstrates pathvar configuration for file paths and patterns (including per: "{sample}"), but provides no mechanism to specify the actual list of samples to process.
Add a configuration parameter for samples (either as a pathvar or config section) to allow users to specify their samples, such as:
samples: ["sample1", "sample2", "sample3"]Then update the tximport rule to read this from config rather than hardcoding.
🤖 Prompt for AI Agents
In meta/bio/salmon_tximport/meta.yaml around lines 21 to 33, add a configurable
samples parameter (e.g. samples: ["sample1","sample2"]) so users can supply the
sample list instead of it being hardcoded; then update the tximport rule in
meta_wrapper.smk (around lines 108-127) to read the sample list from config
(config["samples"] or config.get("samples", [])) and use that value in all
expand() calls (with a sensible fallback or validation error if missing) rather
than the current hardcoded ["S1","S2"].
This PR tries to make salmon-tximport meta-wrapper accessible through
pathvars. Yet, I'd like to get some advice on how to proceed smoothly with theexpandkeyword and thepathvars. But the meta-wrapper wouldn't work if user does not modify the last rule : for now, I still list samples within theexpandfunction.QC
snakemake-wrappers.While the contributions guidelines are more extensive, please particularly ensure that:
test.pywas updated to call any added or updated example rules in aSnakefileinput:andoutput:file paths in the rules can be chosen arbitrarilyinput:oroutput:)tempfile.gettempdir()points tometa.yamlcontains a link to the documentation of the respective tool or command underurl:Summary by CodeRabbit
Release Notes